Conversation

@mattf (Collaborator) commented Sep 8, 2025

What does this PR do?

update VertexAI inference provider to use openai-python for openai-compat functions

Test Plan

```
$ VERTEX_AI_PROJECT=... uv run llama stack build --image-type venv --providers inference=remote::vertexai --run
...
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model vertexai/vertex_ai/gemini-2.5-flash tests/integration/inference/test_openai_completion.py
...
```

i don't have an account to test this. get_api_key may also need to be updated per https://cloud.google.com/vertex-ai/generative-ai/docs/start/openai
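For reference, that Google doc describes an OpenAI-compatible endpoint that openai-python can target directly. A minimal sketch of building its base URL (the `v1beta1` path segment and the example project/location values are assumptions taken from that doc, not from this PR's code):

```python
def vertex_openai_base_url(project: str, location: str) -> str:
    """Build the OpenAI-compatible base URL for a Vertex AI project.

    The path shape follows Google's "Call Vertex AI with the OpenAI
    libraries" doc; double-check the API version segment against it.
    """
    return (
        f"https://{location}-aiplatform.googleapis.com/v1beta1/"
        f"projects/{project}/locations/{location}/endpoints/openapi"
    )


# Hypothetical usage (requires `openai`; api_key must be a short-lived
# OAuth access token, not a static key):
# from openai import AsyncOpenAI
# client = AsyncOpenAI(
#     base_url=vertex_openai_base_url("my-project", "us-central1"),
#     api_key=token,
# )
```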

meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Sep 8, 2025
@mattf (Collaborator, Author) commented Sep 8, 2025

@leseb here you go

mattf and others added 2 commits September 10, 2025 15:22
OpenAIMixin expects to use an API key and creates its own AsyncOpenAI
client, so our code now authenticates with the Google service, retrieves
a token, and passes it to the OpenAI client.
It falls back to an empty string if credentials can't be obtained
(letting LiteLLM handle ADC directly).

Signed-off-by: Sébastien Han <[email protected]>
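The flow described in that commit message might look roughly like this (a sketch, not the PR's actual code; it assumes `google-auth` is installed and the function name is hypothetical):

```python
def get_vertex_access_token() -> str:
    """Return a short-lived OAuth access token for Vertex AI, or "" on failure.

    Returning an empty string lets the caller fall back to LiteLLM
    resolving Application Default Credentials (ADC) on its own.
    """
    try:
        import google.auth
        import google.auth.transport.requests

        credentials, _project = google.auth.default(
            scopes=["https://www.googleapis.com/auth/cloud-platform"]
        )
        # Access tokens are short-lived; refresh so the AsyncOpenAI
        # client is handed a currently valid one.
        credentials.refresh(google.auth.transport.requests.Request())
        return credentials.token or ""
    except Exception:
        # No usable credentials (or google-auth missing): fall back.
        return ""
```

A design note: swallowing the exception is deliberate here, since the empty-string fallback is exactly the behavior the commit message describes.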
leseb force-pushed the use-openai-for-vertexai branch from 73e99b6 to b9961c8 on September 10, 2025 13:22
@leseb (Collaborator) commented Sep 10, 2025

Test plan:

```
GOOGLE_APPLICATION_CREDENTIALS=/Users/leseb/Documents/AI/llama-stack/service_account.json  VERTEX_AI_PROJECT=assisted-installer uv run llama stack build --image-type venv --providers inference=remote::vertexai --run
```

```
LLAMA_STACK_CONFIG=http://localhost:8321 uv run --group test pytest -v -ra --text-model vertexai/vertex_ai/gemini-2.5-flash tests/integration/inference/test_openai_completion.py
Uninstalled 1 package in 5ms
Installed 1 package in 2ms
============================================= test session starts ==============================================
platform darwin -- Python 3.12.8, pytest-8.4.1, pluggy-1.6.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.12.8', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.1', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0', 'hydra-core': '1.3.2'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0, hydra-core-1.3.2
asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 27 items

tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming[txt=vertexai/vertex_ai/gemini-2.5-flash-inference:completion:sanity] SKIPPED [  3%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_non_streaming_suffix[txt=vertexai/vertex_ai/gemini-2.5-flash-inference:completion:suffix] SKIPPED [  7%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_streaming[txt=vertexai/vertex_ai/gemini-2.5-flash-inference:completion:sanity] SKIPPED [ 11%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=vertexai/vertex_ai/gemini-2.5-flash-1] SKIPPED [ 14%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vertexai/vertex_ai/gemini-2.5-flash] SKIPPED [ 18%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:non_streaming_01] PASSED [ 22%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_01] PASSED [ 25%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_01] SKIPPED [ 29%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-True] PASSED [ 33%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-True] PASSED [ 37%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming_with_file[txt=vertexai/vertex_ai/gemini-2.5-flash] SKIPPED [ 40%]
tests/integration/inference/test_openai_completion.py::test_openai_completion_prompt_logprobs[txt=vertexai/vertex_ai/gemini-2.5-flash-0] SKIPPED [ 44%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:non_streaming_02] PASSED [ 48%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_02] PASSED [ 51%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_02] SKIPPED [ 55%]
tests/integration/inference/test_openai_completion.py::test_inference_store[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-False] PASSED [ 59%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[openai_client-txt=vertexai/vertex_ai/gemini-2.5-flash-False] PASSED [ 62%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:non_streaming_01] PASSED [ 66%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_01] PASSED [ 70%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_01] SKIPPED [ 74%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-True] PASSED [ 77%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-True] PASSED [ 81%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_non_streaming[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:non_streaming_02] PASSED [ 85%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_02] PASSED [ 88%]
tests/integration/inference/test_openai_completion.py::test_openai_chat_completion_streaming_with_n[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-inference:chat_completion:streaming_02] SKIPPED [ 92%]
tests/integration/inference/test_openai_completion.py::test_inference_store[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-False] PASSED [ 96%]
tests/integration/inference/test_openai_completion.py::test_inference_store_tool_calls[client_with_models-txt=vertexai/vertex_ai/gemini-2.5-flash-False] PASSED [100%]

=========================================== short test summary info ============================================
SKIPPED [3] tests/integration/inference/test_openai_completion.py:46: Model vertexai/vertex_ai/gemini-2.5-flash hosted by remote::vertexai doesn't support OpenAI completions.
SKIPPED [3] tests/integration/inference/test_openai_completion.py:104: Model vertexai/vertex_ai/gemini-2.5-flash hosted by remote::vertexai doesn't support vllm extra_body parameters.
SKIPPED [4] tests/integration/inference/test_openai_completion.py:83: Model vertexai/vertex_ai/gemini-2.5-flash hosted by remote::vertexai doesn't support n param.
SKIPPED [1] tests/integration/inference/test_openai_completion.py:110: Model vertexai/vertex_ai/gemini-2.5-flash hosted by remote::vertexai doesn't support chat completion calls with base64 encoded files.
================================= 16 passed, 11 skipped, 2 warnings in 31.95s ==================================
```

@mattf (Collaborator, Author) commented Sep 10, 2025

@leseb lgtm. thanks for finishing it.

leseb merged commit 0e27016 into llamastack:main Sep 10, 2025; 22 checks passed.
iamemilio pushed a commit to iamemilio/llama-stack that referenced this pull request Sep 24, 2025 (llamastack#3377).